Mining Non-redundant Information-Theoretic Dependencies between Itemsets
Abstract
We present an information-theoretic framework for mining dependencies between itemsets in binary data. The problem of closure-based redundancy in this context is theoretically investigated, and we present both lossless and lossy pruning techniques. An efficient and scalable algorithm is proposed, which exploits the inclusion-exclusion principle for fast entropy computation. This algorithm is empirically evaluated through experiments on synthetic and real-world data.
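The abstract says the algorithm uses the inclusion-exclusion principle to compute entropy quickly. The minimal Python sketch below illustrates the underlying identity. It assumes binary items, transactions given as Python sets, and entropy in bits. For each 0/1 assignment to an itemset X, the probability is recovered purely from the frequencies of subsets of X: the items set to 1 form T, and p = Σ over S with T ⊆ S ⊆ X of (-1)^|S \ T| · fr(S). Those probabilities give the joint entropy H(X). The conditional entropy H(Y | X) = H(X ∪ Y) - H(X) is included as one plausible dependency measure. The function names, the toy database, and the choice of measure are illustrative assumptions, not the paper's definitions or implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2


def frequency(db, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t) / len(db)


def subsets(items):
    """Yield all subsets of `items` as frozensets, including the empty set."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def entropy_inclusion_exclusion(db, X):
    """Joint entropy H(X) in bits, computed only from itemset frequencies."""
    X = frozenset(X)
    fr = {S: frequency(db, S) for S in subsets(X)}
    h = 0.0
    for T in subsets(X):
        # P(items in T are 1 and items in X \ T are 0), by inclusion-exclusion
        p = sum((-1) ** len(S - T) * f for S, f in fr.items() if T <= S)
        if p > 1e-12:  # convention 0 * log 0 = 0; also absorbs rounding noise
            h -= p * log2(p)
    return h


def entropy_direct(db, X):
    """Reference implementation: count each 0/1 pattern of X directly."""
    X = sorted(X)
    counts = Counter(tuple(i in t for i in X) for t in db)
    n = len(db)
    return -sum(c / n * log2(c / n) for c in counts.values())


def conditional_entropy(db, Y, X):
    """H(Y | X) = H(X u Y) - H(X); 0 means Y is fully determined by X."""
    return (entropy_inclusion_exclusion(db, set(X) | set(Y))
            - entropy_inclusion_exclusion(db, X))


# Hypothetical toy database of six transactions
db = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}, {"c"}, {"a", "b"}]

if __name__ == "__main__":
    print(round(entropy_inclusion_exclusion(db, {"a", "b"}), 4))  # 1.7925
    print(conditional_entropy(db, {"c"}, {"a", "b"}))
```

This sketch enumerates all 2^|X| subsets and rescans the database for each frequency, so it only demonstrates that the frequencies of the subsets determine the entropy. A scalable miner would reuse frequencies it has already computed instead of recounting them.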
Similar papers
Mining Non-Redundant Frequent Pattern in Taxonomy Datasets using Concept Lattices
In general, frequent itemsets are generated from large data sets by applying various association rule mining algorithms, and these algorithms produce many redundant frequent itemsets. In this paper we propose a new framework for non-redundant frequent itemset generation on taxonomy datasets, using closed frequent itemsets and concept lattices, without loss of information. General Terms: Frequent Pattern, Assoc...
Mining Constant Conditional Functional Dependencies for Improving Data Quality
This paper applies data mining techniques to data cleaning, where they prove effective at discovering Constant Conditional Functional Dependencies (CCFDs) from relational databases. These CCFDs are used as business rules for context-dependent data validation. Conditional Functional Dependencies (CFDs) are an extension of Functional Dependencies (FDs) which capture the consistency of data by su...
Depth-First Non-Derivable Itemset Mining
Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collecti...
A lattice-based approach for mining most generalization association rules
Traditional association rules consist of some redundant information. Some variants based on support and confidence measures such as non-redundant rules and minimal non-redundant rules were thus proposed to reduce the redundant information. In the past, we proposed most generalization association rules (MGARs), which were more compact than (minimal) non-redundant rules in that they considered th...
Closed Non-derivable Itemsets
Itemset mining typically results in large amounts of redundant itemsets. Several approaches such as closed itemsets, non-derivable itemsets and generators have been suggested for losslessly reducing the amount of itemsets. We propose a new pruning method based on combining techniques for closed and non-derivable itemsets that allows further reductions of itemsets. This reduction is done without...
Publication date: 2010